Computer Science And Technology - Research Publications
Permanent URI for this collection: https://kr.cup.edu.in/handle/32116/82
9 results
Item An Empirical Study on Detection of Android Adware Using Machine Learning Techniques (Springer, 2023-10-06) Farooq, Umar; Khurana, Surinder Singh; Singh, Parvinder; Kumar, Munish
The Android operating system has enjoyed unprecedented popularity and, showing no signs of decline, continues to thrive with a large user base. Its support for third-party applications has transformed the digital landscape, allowing developers to generate revenue through advertising. Adware has emerged as a prominent monetization method both for Adware developers and for developers of the applications that integrate it. However, as the use of Adware proliferates, so does the risk of fraudulent activity associated with advertising. The increasing prevalence of Adware creates a pressing need for robust strategies to detect and mitigate the harmful effects of such fraudulent practices. In response, the proposed system analyzes network traffic captured from Android devices and identifies alterations in it. The research extensively explores machine learning and deep learning models with the aim of improving Adware detection and mitigation. In binary classification, the system performs best with the LGBM model, whereas in multiclass classification the XGBM model outperforms the other models at distinguishing and classifying Adware and general Malware. These outcomes show that the system classifies adware instances accurately in both classification scenarios. The findings validate the viability of the proposed system and underscore the superior performance of specific machine learning models employed in the research.
With further refinement and optimization, the system holds great promise for enhancing the security and integrity of the Android ecosystem. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Item Sentiment analysis of Hindi language text: a critical review (Springer, 2023-11-11) Sidhu, Simran; Khurana, Surinder S.; Kumar, Munish; Singh, Parvinder; Bamber, Sukhvinder S.
Sentiment analysis involves extracting sentiments from various forms of text, including customer reviews, tweets, blogs, and news clips that express opinions on diverse subjects, including popular events. The advent of tools supporting regional languages has led to a substantial surge in regional-language text. As Hindi ranks fourth in terms of native speakers, developing sentiment analysis mechanisms for Hindi text is crucial. This paper provides a comprehensive review of specific approaches used in Hindi sentiment analysis, including negation handling and the evolution of SentiWordNet for the Hindi language. It also offers an overview of available Hindi lexicons and of the stemmers and morphological analyzers designed for the language. Additionally, the paper conducts an in-depth literature review of sentiment analysis tasks carried out in Hindi, opening avenues for future research in sentiment analysis and opinion mining in the Hindi language. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Item Detection of content-based cybercrime in Roman Kashmiri using ensemble learning (Springer, 2023-09-25) Farooq, Umar; Singh, Parvinder; Khurana, Surinder Singh; Kumar, Munish
Kashmiri, or Koshur, the official language of Kashmir, is spoken by more than 7 million people, yet content-based cybercrime detection for the language remains unexplored in theoretical and experimental research.
Furthermore, the absence of programming libraries for sentiment analysis and of a benchmark corpus has impeded progress in this field. Working with Kashmiri is further complicated by its diverse scripts, including Perso-Arabic, Sharada, Devanagari, and Roman. Detecting cybercrime in this language is challenging because of its complex morphology, lack of resources, scarcity of annotated datasets, and varied linguistic characteristics; overcoming these obstacles is essential to developing effective detection systems. This paper attempts to detect content-based cybercrime in Roman Kashmiri script, which the Kashmiri community uses extensively on online platforms such as social media, chat rooms, and email. A well-balanced and meaningful dataset, the first of its kind in this context, is compiled from positive and negative comments, and three strategies were employed for analysis. The findings reveal that the Tf-Idf Vectorizer outperforms the other tokenization methods (Count Vectorizer and Tf-Idf Transformer), that bi-gram notation performs better than uni-gram and tri-gram notation, and that the XGBM is the most effective classifier in terms of evaluation metrics. Leveraging these strategies, Python applications were developed for text classification that successfully distinguish cyberbullying (unsafe) from non-cyberbullying (safe) instances, with the XGBM achieving exceptional accuracy using the Tf-Idf Vectorizer with bi-grams, a Bag of Words, and lexical features. This pioneering research underscores the urgent need for advances in content-based cybercrime detection for the Kashmiri language, paving the way for detection systems that address language-specific challenges and promote a safer online environment for the Kashmiri community.
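The Tf-Idf bi-gram representation that performed best here can be illustrated with a small, dependency-free sketch. It assumes whitespace tokenization with lowercasing and mirrors the smoothed-idf, L2-normalized defaults of scikit-learn's TfidfVectorizer; the paper's exact tokenizer and settings are not stated in this abstract, and the function names `ngrams` and `tfidf` are hypothetical.

```python
import math
from collections import Counter


def ngrams(text: str, n: int = 2) -> list[str]:
    """Lowercase, split on whitespace, and join each run of n adjacent tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs: list[str], n: int = 2) -> tuple[list[str], list[list[float]]]:
    """Build a TF-IDF matrix over word n-grams (bi-grams by default).

    idf uses the smoothed form ln((1 + N) / (1 + df)) + 1 and every row is
    L2-normalised, mirroring scikit-learn's TfidfVectorizer defaults.
    """
    grams = [ngrams(doc, n) for doc in docs]
    vocab = sorted({g for doc_grams in grams for g in doc_grams})
    # Document frequency: the number of documents containing each n-gram.
    df = Counter(g for doc_grams in grams for g in set(doc_grams))
    n_docs = len(docs)
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in vocab}

    matrix = []
    for doc_grams in grams:
        tf = Counter(doc_grams)  # raw n-gram counts within this document
        row = [tf[g] * idf[g] for g in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by 0
        matrix.append([v / norm for v in row])
    return vocab, matrix


if __name__ == "__main__":
    # Hypothetical English placeholders, not samples from the Kashmiri corpus.
    comments = ["you are very kind", "you are very rude"]
    vocab, matrix = tfidf(comments)
    print(vocab)
    for row in matrix:
        print([round(v, 3) for v in row])
```

Each row can then be passed to a classifier such as the XGBM named in the abstract. Bi-grams that occur in fewer comments get a larger idf and therefore a higher weight, which is what lets distinctive two-word phrases stand out.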
Furthermore, this research opens new avenues for detecting and preventing cybercrime in Kashmiri and potentially in other languages that lack robust cybercrime detection methodologies. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Item CottonLeafNet: cotton plant leaf disease detection using deep neural networks (Springer, 2023-03-18) Singh, Paramjeet; Singh, Parvinder; Farooq, Umar; Khurana, Surinder Singh; Verma, Jitendra Kumar; Kumar, Munish
India is a predominantly agricultural country whose agricultural production sustains a substantial proportion of the population and on which the whole Indian economy relies heavily. According to research, agriculture provides subsistence for around 70% of rural households. India ranks second in agricultural output and ninth in agricultural exports. Notably, it ranks first globally in cotton exports, contributing substantially to the country's economy. However, various crops, especially cotton plants, are documented to be severely harmed by pests, extreme climatic variations, nutrient deficiency and toxicity, and similar factors. Cotton plants suffer from a wide range of disorders, from bacterial infections to nutritional deficiencies, which are hard to recognize with the human eye. Moreover, most researchers have considered only a few types of cotton leaf disease and excluded many others. With these constraints in mind, this research aims to aid the detection of these diseases using deep learning. The research begins by acquiring a near-balanced dataset covering 22 leaf disease types, including bacterial, fungal, viral, and nutrient-deficiency diseases, followed by data augmentation to boost model performance. Many algorithms were tested, and the CNN proved to be highly efficient and effective.
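Data augmentation, used here to boost model performance, enlarges a training set with label-preserving transformations. The abstract does not say which transformations were applied; the sketch below assumes simple geometric ones (flips and 90-degree rotations) on a grayscale image stored as a list of pixel rows, and the helper names are hypothetical.

```python
# A grayscale image as a list of pixel rows (hypothetical representation).
Image = list[list[int]]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rot90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img: Image, label: str) -> list[tuple[Image, str]]:
    """Return the original plus flipped and rotated copies, all with the same label."""
    variants = [img, hflip(img), vflip(img)]
    rotated = img
    for _ in range(3):  # 90, 180 and 270 degrees
        rotated = rot90(rotated)
        variants.append(rotated)
    return [(v, label) for v in variants]


if __name__ == "__main__":
    leaf = [[1, 2], [3, 4]]  # toy 2x2 "image"
    for pixels, label in augment(leaf, "bacterial"):  # hypothetical class name
        print(label, pixels)
```

A leaf photographed mirrored or rotated still shows the same disease, which is why these transformations are assumed to keep the label unchanged; each input yields six training samples.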
When evaluated on the test set, the proposed model achieves an accuracy of 99.39% with a negligible error rate, outperforming all existing approaches while consuming less computational time. The outcome shows that the proposed approach is efficient enough to be deployed in real-time detection systems, enabling precise detection of cotton leaf diseases so that farmers can take appropriate action. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Item Recognition of offline handwritten Urdu characters using RNN and LSTM models (Springer, 2022-06-17) Misgar, Muzafar Mehraj; Mushtaq, Faisel; Khurana, Surinder Singh; Kumar, Munish
Optical Character Recognition (OCR) converts scanned documents and images into searchable and editable content. OCR is language dependent, and, unlike for languages such as English and Hindi, very limited research has been carried out for Urdu and Urdu-like scripts (e.g., Farsi and Arabic). This gap is attributed to the lack of publicly available benchmark databases and to complexities inherent in these scripts, such as their cursive nature and the change in a character's shape depending on its position in a ligature. Each character has 2 to 4 different shapes depending on its position in the word: initial, medial, or final. In this article, we propose a methodology to automate the data collection process, collect a large handwritten dataset of 110,785 Urdu characters, and present a comparative analysis of two deep learning models, SimpleRNN and LSTM, to showcase the potential of RNN models for character recognition. Data were collected from 250 authors on A4-size sheets. Each sheet contains 132 shapes of Urdu characters and 10 numerals.
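For reference, the two models compared in this study follow the standard recurrences below, shown in their common textbook form; the paper's layer sizes and hyperparameters are not given in this abstract. Here $x_t$ is the input at step $t$ (when images are fed to a recurrent model, a common choice, assumed here, is one pixel row per step), $h_t$ is the hidden state, $W$ and $U$ are input and recurrent weight matrices, $b$ are biases, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication.

```latex
\begin{aligned}
\text{SimpleRNN:}\quad h_t &= \tanh(W_h x_t + U_h h_{t-1} + b_h) \\[4pt]
\text{LSTM:}\quad f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget, input, and output gates ($f_t$, $i_t$, $o_t$) and the additive cell-state update $c_t$ are commonly cited as the reason LSTMs retain information over longer sequences than a SimpleRNN, whose state is entirely rewritten through $\tanh$ at every step.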
To the best of the authors' knowledge, this is the first dataset of this size that contains all possible shapes of Urdu characters as well as numerals. Experiments were conducted separately on the numerals, the full characters, and the whole dataset to compare the classification capabilities of the RNN and LSTM models. Despite the inherent complexities of the Urdu script, the RNN and LSTM models proved effective, achieving high accuracy rates. The RNN achieved accuracies of 96.96% for numerals, 85.22% for full characters, and 73.62% for the whole dataset, while the LSTM outperformed it with maximum accuracies of 97.80% for numerals, 97.43% for full characters, and 91.30% for the whole dataset. In addition, the proposed dataset opens a new window for future research, showcasing its potential for data analysis not only for Urdu but also for other languages, such as Arabic and Persian, that use similar character sets. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Item An efficient approach for copy-move image forgery detection using convolution neural network (Springer, 2022-02-17) Koul, Saboor; Kumar, Munish; Khurana, Surinder Singh; Mushtaq, Faisel; Kumar, Krishan
Digital imaging has become fundamental in this era of technology, which is also marked by unconventional image forgery techniques and tools. Because digital images can be forged, they cannot readily be presented as evidence. This calls for a deep investigation of the issue to help mitigate such abuses. Copy-move and splicing are the prevailing techniques for creating forged images in this digital age.
Copy-move involves copying one part of an image and pasting it into another part of the same image, whereas splicing involves merging two images to significantly alter the original and create a new, forged one. In this article, a novel approach using a convolutional neural network (CNN) is proposed for automatic copy-move forgery detection. The experiments use the benchmark dataset MICC-F2000, which consists of 2000 images, of which 1300 are original and 700 are forged. The experimental results show that the proposed model outperforms traditional copy-move forgery detection methods, achieving a highly promising accuracy of 97.52%, which is 2.52% higher than that of existing methods. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Item Semi-supervised labeling: a proposed methodology for labeling the twitter datasets (Springer, 2022-01-28) Jan, Tabassum Gull; Khurana, Surinder Singh; Kumar, Munish
Twitter has become a popular microblogging and social media platform for news and discussion. The dramatic growth of the platform has also triggered a dramatic increase in spam on it. Supervised machine learning always requires a labeled Twitter dataset, so it is desirable to design a semi-supervised technique for labeling newly prepared, recent datasets. Preparing a labeled dataset requires considerable human effort. This issue motivated us to propose an efficient approach for preparing labeled datasets that saves time and avoids human error. The proposed approach relies on features that are readily available in real time, for better performance and wider applicability. This work collects users' most recent tweets via Twitter streaming to prepare a recent Twitter dataset.
Finally, a semi-supervised machine learning algorithm based on the self-training technique was designed for labeling the tweets, with semi-supervised support vector machine and semi-supervised decision tree classifiers as the base classifiers. The authors also applied the k-means clustering algorithm to the tweets based on their content. The resulting approach is an ensemble of semi-supervised and unsupervised learning, in which the semi-supervised algorithms were found to predict more accurately than the unsupervised ones. To assign labels to the tweets effectively, the authors implemented voting: the label predicted by the majority voting classifier is the label assigned to each tweet in the dataset. A maximum accuracy of 99.0% is reported for spam labeling using the majority voting classifier. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Item Face detection in still images under occlusion and non-uniform illumination (Springer, 2021-01-26) Kumar, Ashu; Kumar, Munish; Kaur, Amandeep
Face detection is an important part of a face recognition system, yet it is not taken seriously: it is taken for granted, and the primary focus is on face recognition. Moreover, the many challenges associated with face detection increase the value of TN (True Negative). A lot of work has been done in the field of face recognition, but not much in face detection, especially under face occlusion and non-uniform illumination. This directly affects the efficiency of applications that depend on face detection, such as face recognition and surveillance. These reasons motivated us to research face detection, especially under face occlusion and non-uniform illumination.
The main objective of this article is to detect faces in still images. Experiments were conducted on images affected by face occlusion and non-uniform illumination, taken from the public AR face dataset and the Color FERET dataset. A manual dataset, consisting of images taken from the internet, was also created for the experiments. The goal is to make the machine intelligent enough to acquire human perception and knowledge, so that it can detect, localize, and recognize a face in an arbitrary image as easily as humans do. This article proposes an efficient technique for face detection in still images under occlusion and non-uniform illumination that combines the YCbCr, HSV, and L*a*b* color models. The proposed technique improves results in terms of Accuracy, Detection Rate, False Detection Rate, and Precision, and can be useful in surveillance and security-related applications. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature.

Item UrduDeepNet: offline handwritten Urdu character recognition using deep neural network (Springer Science and Business Media Deutschland GmbH, 2021-06-07) Mushtaq, Faisel; Misgar, Muzafar Mehraj; Kumar, Munish; Khurana, Surinder Singh
Handwritten Urdu character recognition systems face several challenges, including writer-dependent variations and the unavailability of benchmark databases for cursive scripts. In this study, we propose a handwritten Urdu character dataset in the Nasta'liq writing style covering isolated and positional characters as well as numerals. We also propose a convolutional neural network (CNN) architecture for recognizing handwritten Urdu characters and numerals.
A CNN is an image recognition technique that does not need explicit feature engineering and extraction and produces better results than standard handcrafted feature-extraction approaches. The proposed system was trained on 74,285 samples and evaluated on a test set of 21,223 samples, achieving a recognition rate of 98.82% for 133 classes and outperforming all state-of-the-art systems for the Urdu language. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
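The point that a CNN needs no handcrafted feature extraction rests on its convolution layers learning their own filters. The sketch below shows, in plain Python, the three operations such layers typically chain: a "valid" stride-1 convolution (computed as cross-correlation, as deep-learning libraries do), ReLU activation, and 2x2 max pooling. The kernel is a fixed, hypothetical vertical-edge filter for illustration only; in a trained CNN the kernel values are learned, and UrduDeepNet's actual layer configuration is not described in this abstract.

```python
Matrix = list[list[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' stride-1 2-D convolution, computed as cross-correlation
    (no kernel flip), which is what CNN layers actually evaluate."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping size x size max pooling; leftover edge rows/columns are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


if __name__ == "__main__":
    # Toy 6x6 image containing a single vertical stroke in column 2.
    image = [[0, 0, 1, 0, 0, 0] for _ in range(6)]
    # Hypothetical fixed vertical-edge kernel; a trained CNN learns these weights.
    kernel = [[1, 0, -1] for _ in range(3)]
    features = conv2d(image, kernel)  # every row is [-3, 0, 3, 0]
    print(max_pool(relu(features)))
```

Stacking many learned kernels in this convolution, ReLU, and pooling pattern, followed by a classification layer over the 133 classes, is the general shape of a CNN recognizer; the filters themselves replace handcrafted stroke features.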