Closed
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
workflow-text-extraction: From a users perspective, text extraction is the affected feature/workflow
Description
I have been trying to extract text from some court case judgement files.
from PyPDF2 import PdfFileReader

with open('17343_2008_Order_09-Jan-2019.pdf', 'rb') as fd:
    pdf = PdfFileReader(fd)
    # getPage() is zero-indexed, so this is the third page
    p1 = pdf.getPage(2)
    print(p1.extractText())

The file:
17343_2008_Order_09-Jan-2019.pdf
Actual output:
$ !#--.&!* #* #(*!*$ !0!#/+(+)+))!(-!
%!3#%*$+$ !)#-$$ #$$ !
,(-,*!($+--.%%!*,($ !#3%,-.1$.%#1),!1*0!#%!
+)$ !6,!0$ #$,$0+.1*'!%!#&+(#'1!$+/%!&."!
$ #$$ !0!#/+(+)+))!(-!C#&&,0#&
#6#,1#'1!0#&'7#--.&!*$+
-+"",$$ !$ !%!1.-$#(-!+)$ !
$+#--!/$$ !/%!&!(-!+)$ !#--.&!*
#$$ !&/+$#(*+.%#--!/$#(-!+)$ !&#,*
),(*,(3-#(%!#&+(#'171!#*$+#-+(-1.&,+($ #$
$ !,(-,*!($ #*+--.%%!*0,$ +.$#(7/%!"!*,$#$,+(
+($ !/#%$+)$ !*!#$ $++9/1#-!
)+.%*#7&#)$!%$ !#11!3!*$ !&!
-,%-."&$#(-!&8,)+(!1!#%(!*2.*3!+)$ ,&
#*$#9!($ !6,!0$ #$)+.%$ !F-!/$,+($+
,&#$$%#-$!*80!#%!+)$ !6,!0$ #$$ !
&#"!& +.1*-+""!(*$+.&)+%#--!/$#(-!,($ !
+)!F-!/$,+(
ˇ$+B
+",-,*!,&(+$".%*!%,),$
,&-+"",$$!*0,$ +.$/%!"!*,$#$,+(,(
#&.**!(),3 $,($ ! !#$+)/#&&,+(
./+(#&.**!(<.#%%!1#(*0,$ +.$$ !
+))!(*!% #6,(3$#9!(.(*.!#*6#($#3!
Expected output:
IN THE SUPREME COURT OF INDIA
CRIMINAL APPELLATE JURISDICTION
CRIMINAL APPEAL NO(S). 2094/2008
AJIT SINGH ...APPELLANT(S) VERSUS
THE STATE OF PUNJAB ...RESPONDENT(S)
ORDER
1. The matter has been referred to this
Bench due to a difference of opinion between the
two learned judges of this Court who had heard the
appeal; one learned judge holding the offence to be
one under Section 304 Part I IPC and the second
learned judge holding the said offence to be one
covered by Section 302 IPC.
2. It appears that following the aforesaid
order the accused has been released from custody in
September, 2011 on the strength of a warrant of
release issued by the jurisdictional Sessions
Judge. Though we fail to understand how the
accused could have been released, pending a
resolution of the difference of opinion between the
two learned judges of this Court, we are not
inclined to go into the said issue and instead deem
it appropriate to go into the core issue arising.
I tried different pages but the problem persists.
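To check pages systematically rather than by eye, a rough heuristic like the one below could flag extraction results that look like the garbled output above. This is only a sketch: `letter_ratio`, `looks_garbled` and the 0.5 threshold are hypothetical names and values chosen for illustration and are not part of PyPDF2. It works on plain strings, so it can be applied to whatever `extractText()` returns for each page.

```python
def letter_ratio(text):
    """Return the fraction of non-whitespace characters that are letters."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        # No visible characters at all: treat as 0.0 so it gets flagged too.
        return 0.0
    return sum(c.isalpha() for c in visible) / len(visible)


def looks_garbled(text, threshold=0.5):
    """Heuristic: mostly punctuation/symbols suggests a broken character mapping.

    The threshold is an assumed value, not a tuned one.
    """
    return letter_ratio(text) < threshold


# First line of the actual output and of the expected output from this report.
actual_line = '$ !#--.&!* #* #(*!*$ !0!#/+(+)+))!(-!'
expected_line = 'IN THE SUPREME COURT OF INDIA'

print(looks_garbled(actual_line))
print(looks_garbled(expected_line))
```

The heuristic only shows that a page came out as symbols instead of words; it does not explain why the characters are mapped incorrectly.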