Questions tagged [python]
Use for data science questions related to the programming language Python. Not intended for general coding questions (which should be asked on Stack Overflow).
6,617 questions
4 votes, 1 answer, 467 views
Handling Profanity Censorship in BERTopic
I'm currently working with a dataset in which profanity is censored. For example, "fuck" appears as 4 heart emojis. Since I'm trying to run topic modelling with BERTopic, what kind of preprocessing would ...
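One possible preprocessing step, sketched under assumptions: collapse each run of heart emojis into a single placeholder token so the tokenizer and vectorizer see one consistent "word" instead of scattered emoji characters. The set of heart code points, the `mask_censored` name, and the `censoredword` token are all hypothetical; check which emojis your dataset actually uses.

```python
import re

# Assumption: the dataset's exact emoji code points are unknown; this covers a few
# common hearts, each optionally followed by the U+FE0F variation selector.
HEARTS = "\u2764\u2665\U0001F493\U0001F495\U0001F496\U0001F497"
HEART_RUN = re.compile(f"(?:[{HEARTS}]\ufe0f?)+")


def mask_censored(text: str, token: str = "censoredword") -> str:
    """Replace each run of heart emojis with a single placeholder token."""
    masked = HEART_RUN.sub(f" {token} ", text)
    # Collapse the extra spaces introduced around the placeholder.
    return " ".join(masked.split())


docs = ["what the \u2764\u2764\u2764\u2764 is this", "no censorship here"]
print([mask_censored(d) for d in docs])
```

If the placeholder ends up dominating topic keywords, one option is to pass a `CountVectorizer` with that token in `stop_words` to BERTopic through its `vectorizer_model` parameter, while still feeding the masked text to the embedding model.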
0 votes, 0 answers, 11 views
What causes a model to have such an output?
I'm training a CSDI model and the output looks very suspicious.
Too few diffusion steps? Too high a learning rate? Nothing seems to change this behavior. Is there some normalization issue that I'm not accounting for? I'...
5 votes, 1 answer, 182 views
How to efficiently merge multiple CSV and JSON files into a single DataFrame using Pandas in Python
I am working with multiple data files in a folder where some files are in CSV format and others are in JSON format. I want to combine all of them into a single DataFrame for further analysis.
Here is ...
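A minimal sketch of one common approach, assuming every CSV has a header row and every JSON file holds a list of records (for JSON Lines files, `pd.read_json(path, lines=True)` would be needed instead). The `load_folder` name and the `source_file` column are hypothetical additions for illustration.

```python
from pathlib import Path

import pandas as pd


def load_folder(folder):
    """Read every .csv and .json file in folder and stack them into one DataFrame."""
    frames = []
    for path in sorted(Path(folder).iterdir()):
        suffix = path.suffix.lower()
        if suffix == ".csv":
            df = pd.read_csv(path)
        elif suffix == ".json":
            # Assumes a list of records; use lines=True for JSON Lines files.
            df = pd.read_json(path)
        else:
            continue  # skip anything that is not CSV or JSON
        df["source_file"] = path.name  # remember where each row came from
        frames.append(df)
    # A single concat at the end avoids re-copying the data for every file.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


# Hypothetical usage: combined = load_folder("data/")
```

Because `pd.concat` defaults to an outer join on columns, a column that exists in only some files is kept and filled with `NaN` for rows from the other files.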
2 votes, 0 answers, 15 views
Trying to understand if I'm implementing GluonTS sliding window splitting and validation set correctly
The documentation is a little confusing, so I thought I would ask here to make sure. I'm using:
...
5 votes, 1 answer, 54 views
Model's forecasts are not anchored correctly to the history
I'm implementing this paper and trying to train it on generated data and return the full ground truths and a single forecast, but the forecasts my model produces are not anchored to the past series ...
7 votes, 4 answers, 336 views
Calculating the next row in a binary matrix
Suppose I have a binary matrix that looks something like this (these are only 10 rows; my dataset has a million rows, so this just shows what the binary matrix looks like):
...
2 votes, 0 answers, 44 views
GAN training (RGB to IR): high-quality results on the training set, but blurry/hallucinated outputs on unseen test images
I am a 6th-semester student, and my mini project is IR pedestrian detection using YOLOv8. My job is to train a GAN (Pix2Pix-based) to generate synthetic IR images. My code is below. My 99th ...
5 votes, 0 answers, 30 views
Unable to predict values for test data
I have built and trained an NMT model using an RNN in Google Colab, and now when I try to predict on my test data, my Google Colab session keeps crashing. The shape of my test data is 47838×55.
...
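A common cause of a Colab crash at prediction time is running out of RAM, though the excerpt does not confirm this. If the decoder also runs for 55 steps and outputs a probability for every vocabulary entry, a hypothetical 10,000-word vocabulary gives 47838 × 55 × 10,000 float32 values, about 105 GB. The sketch below predicts in slices and reduces each slice to token ids before concatenating; `predict_in_batches` and `fake_model_predict` are hypothetical names, and the fake model only stands in for a real `model.predict`.

```python
import numpy as np


def predict_in_batches(predict_fn, X, batch_size=1024):
    """Apply predict_fn to consecutive slices of X and concatenate the results."""
    outputs = []
    for start in range(0, len(X), batch_size):
        # Each call only materializes one batch of model outputs at a time.
        outputs.append(predict_fn(X[start:start + batch_size]))
    return np.concatenate(outputs, axis=0)


def fake_model_predict(batch, vocab_size=50):
    """Hypothetical stand-in for model.predict: per-step scores over a vocabulary."""
    rng = np.random.default_rng(0)
    return rng.random((len(batch), batch.shape[1], vocab_size), dtype=np.float32)


X_test = np.zeros((5000, 55), dtype=np.int32)
# Reduce each batch to token ids before concatenating, so the full
# (rows, steps, vocab) probability tensor never exists in memory at once.
token_ids = predict_in_batches(lambda b: fake_model_predict(b).argmax(axis=-1), X_test)
print(token_ids.shape)  # (5000, 55)

# With a real Keras model (hypothetical name `model`):
# token_ids = predict_in_batches(lambda b: model.predict(b, verbose=0).argmax(axis=-1), X_test)
```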
6 votes, 1 answer, 129 views
When attempting to maximize the F1 score of a decision tree on test data using cost-complexity pruning, why does it yield the fully grown tree?
I'm learning about classification with decision trees. I'm using the DecisionTreeClassifier class from the scikit-learn library in Python to train the model on the training data (which yields a fully grown tree), ...
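A sketch of one way to choose `ccp_alpha`, under assumptions: it uses scikit-learn's bundled breast-cancer data as a stand-in, and it selects alpha on a held-out validation split rather than on the test set (tuning on test data leaks information into model selection). One plausible reason the fully grown tree "wins": the alphas from `cost_complexity_pruning_path` are increasing and start from the fully grown tree, so taking the first maximum (as `np.argmax` does) breaks ties in favor of the unpruned tree. The sketch instead takes the largest alpha among equally good scores.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption): any binary classification data works the same way.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)
# Alphas are increasing; the last one prunes the tree down to a single node.
ccp_alphas = path.ccp_alphas[:-1]

scores = []
for alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    scores.append(f1_score(y_val, clf.predict(X_val)))

# Among equally good alphas, prefer the largest, i.e. the simplest tree.
best_score = max(scores)
best_alpha = max(a for a, s in zip(ccp_alphas, scores) if s == best_score)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)
print(best_alpha, best_score, full.get_n_leaves(), pruned.get_n_leaves())
```

The test set would then be used once, to report the F1 of the `pruned` tree.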
2 votes, 0 answers, 32 views
Graph analysis for 2-dimensional edges for natural language processing
Given a text resource (corpus, novel, ...), I want to 1) find pairs of words that appear together statistically significantly often and 2) extract contextual knowledge from these pairs. For simplicity, I'm ...
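For step 1, a common association measure is pointwise mutual information, PMI(a, b) = log(P(a, b) / (P(a) P(b))). The sketch below swaps in PMI as a stand-in; it is a measure of association, not a significance test (a log-likelihood ratio or chi-square test would be needed for that). It assumes sentence-level co-occurrence and whitespace tokenization; `pmi_pairs` and `min_count` are hypothetical names. The scored pairs could serve as weighted edges in a word graph.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_pairs(sentences, min_count=2):
    """Score word pairs that co-occur in the same sentence by PMI."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_counts.update(words)
        # Sorted so each unordered pair is counted under one key.
        pair_counts.update(combinations(sorted(words), 2))
    n = len(sentences)
    scores = {}
    for (a, b), count in pair_counts.items():
        if count < min_count:
            continue  # PMI overrates rare pairs, so skip them
        # Probabilities are estimated as the fraction of sentences containing the word(s).
        scores[(a, b)] = math.log((count / n) / ((word_counts[a] / n) * (word_counts[b] / n)))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


sentences = ["the cat sat", "the cat ran", "a dog ran", "a dog sat"]
for pair, score in pmi_pairs(sentences):
    print(pair, round(score, 3))
```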
5 votes, 1 answer, 120 views
Unable to run pandas/modin[ray] code on SageMaker Unified Studio
I am working on a movie recommendation problem where I get multiple files from the source, and the total data size is around 900 MB. I am using the ...
5 votes, 1 answer, 80 views
Python data science on Windows behind a proxy that is impossible to get through: which solution do you use?
In our organization, IT has deployed a strict proxy policy (no special rights for anyone).
Windows 11 is installed for everyone, with no exceptions.
To run data science tasks using Python, in ...
1 vote, 0 answers, 20 views
Unexpected Feature Importance Pattern in Random Forest Classification of MNIST Digits 0 and 1
I performed Random Forest–based feature importance analysis on the MNIST dataset, focusing only on digits 0 and 1.
When I visualize the importance map (see image below), it doesn’t resemble the ...
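One plausible explanation, offered as a hedged sketch: impurity-based importance rewards pixels that help *discriminate* 0 from 1, not pixels that trace the shape of either digit, so the map tends to resemble where the two classes differ rather than a digit. The sketch swaps in scikit-learn's bundled 8×8 digits for 28×28 MNIST so it runs without a download, and compares the importance map with the absolute difference of the class means. Correlated neighbouring pixels also split importance between them, which can make the map look patchy.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

# Assumption: scikit-learn's bundled 8x8 digits stand in for 28x28 MNIST.
digits = load_digits()
mask = np.isin(digits.target, [0, 1])
X, y = digits.data[mask], digits.target[mask]

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importance_map = rf.feature_importances_.reshape(8, 8)

# Where do the two classes differ on average? Discriminative pixels, not
# digit-shaped pixels, are what impurity-based importance rewards.
mean_diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0)).reshape(8, 8)

print(np.round(importance_map, 3))
print("correlation with class-mean difference:",
      np.corrcoef(importance_map.ravel(), mean_diff.ravel())[0, 1])
```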
2 votes, 0 answers, 27 views
How can I group transcribed phrases into meaningful chunks without using complex models?
I have a large set of phrases obtained via Azure Fast Transcription, and I need to group them into coherent semantic chunks (to use later in a RAG pipeline).
Initially, I tried grouping phrases based ...
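A simple baseline that needs no model at all, sketched under assumptions: greedily merge consecutive phrases and start a new chunk whenever there is a long pause or the chunk would grow past a size limit. It assumes each transcribed phrase can be mapped to a start time, end time, and text; the `Phrase` container, `chunk_phrases`, and the default thresholds are hypothetical. Pauses are only a rough proxy for topic boundaries, so this is a starting point rather than true semantic grouping.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    """Hypothetical container; map your transcription fields onto it."""
    start_ms: int
    end_ms: int
    text: str


def chunk_phrases(phrases, max_gap_ms=1500, max_chars=800):
    """Greedily merge phrases; start a new chunk on a long pause or size limit."""
    chunks, current, last_end = [], [], 0
    for phrase in sorted(phrases, key=lambda p: p.start_ms):
        # Length the current chunk would have (with joining spaces) after adding phrase.
        size = sum(len(p.text) + 1 for p in current) + len(phrase.text)
        long_pause = phrase.start_ms - last_end > max_gap_ms
        if current and (long_pause or size > max_chars):
            chunks.append(" ".join(p.text for p in current))
            current = []
        current.append(phrase)
        last_end = phrase.end_ms
    if current:
        chunks.append(" ".join(p.text for p in current))
    return chunks


phrases = [
    Phrase(0, 1000, "Hello there."),
    Phrase(1200, 2000, "How are you?"),
    Phrase(5000, 6000, "New topic."),
]
print(chunk_phrases(phrases))  # ['Hello there. How are you?', 'New topic.']
```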
0 votes, 0 answers, 26 views
How to extract my fingerprint from my laptop's fingerprint sensor
I have a bunch of fingerprints as a dataset (my college gave them to me). I want to use these fingerprints to train a model to understand different things. That is beside the point....