Garbage output on parsing law pdf files.

I have been trying to extract text from some court judgment PDF files with PyPDF2, but `extractText()` returns garbage.

```python
from PyPDF2 import PdfFileReader

with open('17343_2008_Order_09-Jan-2019.pdf', 'rb') as fd:
    pdf = PdfFileReader(fd)
    p1 = pdf.getPage(2)  # page indices are zero-based, so this is the third page
    print(p1.extractText())
```
The file:
[17343_2008_Order_09-Jan-2019.pdf](https://github.com/mstamy2/PyPDF2/files/3796761/17343_2008_Order_09-Jan-2019.pdf)


Actual output:
```
$ !#--.&!* #* #(*!*$ !0!#/+(+)+))!(-!
%!3#%*$+$ !)#-$$ #$$ !
,(-,*!($+--.%%!*,($ !#3%,-.1$.%#1),!1*0!#%!
+)$ !6,!0$ #$,$0+.1*'!%!#&+(#'1!$+/%!&."!
$ #$$ !0!#/+(+)+))!(-!C#&&,0#&
#6#,1#'1!0#&'7#--.&!*$+
-+"",$$ !$ !%!1.-$#(-!+)$ !
$+#--!/$$ !/%!&!(-!+)$ !#--.&!*
#$$ !&/+$#(*+.%#--!/$#(-!+)$ !&#,*
),(*,(3-#(%!#&+(#'171!#*$+#-+(-1.&,+($ #$
$ !,(-,*!($ #*+--.%%!*0,$ +.$#(7/%!"!*,$#$,+(
+($ !/#%$+)$ !*!#$ $++9/1#-!
)+.%*#7&#)$!%$ !#11!3!*$ !&!
-,%-."&$#(-!&8,)+(!1!#%(!*2.*3!+)$ ,&
 #*$#9!($ !6,!0$ #$)+.%$ !F-!/$,+($+
,&#$$%#-$!*80!#%!+)$ !6,!0$ #$$ !
&#"!& +.1*-+""!(*$+.&)+%#--!/$#(-!,($ !
+)!F-!/$,+(
ˇ$+B
 +",-,*!,&(+$".%*!%,),$
,&-+"",$$!*0,$ +.$/%!"!*,$#$,+(,(
#&.**!(),3 $,($ ! !#$+)/#&&,+(
./+(#&.**!(<.#%%!1#(*0,$ +.$$ !
+))!(*!% #6,(3$#9!(.(*.!#*6#($#3!
```
Expected output:

```
IN THE SUPREME COURT OF INDIA
CRIMINAL APPELLATE JURISDICTION
CRIMINAL APPEAL  NO(S).  2094/2008
AJIT SINGH ...APPELLANT(S) VERSUS
THE STATE OF PUNJAB ...RESPONDENT(S)
ORDER 
1. The matter has been referred to this
Bench due to a difference of opinion between the
two learned judges of this Court who had heard the
appeal; one learned judge holding the offence to be
one under Section 304 Part I IPC and the second
learned judge holding the said offence to be one
covered by Section 302 IPC.
2. It appears that following the aforesaid
order the accused has been released from custody in
September, 2011 on the strength of a warrant of
release issued by the jurisdictional Sessions
Judge.  Though we fail to understand how the
accused could have been released, pending a
resolution of the difference of opinion between the
two learned judges of this Court, we are not
inclined to go into the said issue and instead deem
it appropriate to go into the core issue arising.
```
I tried different pages but the problem persists.
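
To make the "every page is affected" check reproducible, here is a rough sketch that scores each page by the share of its visible characters that are letters. The helper names `letter_ratio` and `report_pages` are made up for illustration and are not part of PyPDF2. The heuristic only flags suspicious pages; it says nothing about why the extraction fails.

```python
def letter_ratio(text):
    """Return the fraction of non-whitespace characters that are letters."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(ch.isalpha() for ch in visible) / len(visible)


def report_pages(path):
    """Print the letter ratio of every page in the PDF at `path`."""
    # Imported lazily so letter_ratio() can be used without PyPDF2 installed.
    from PyPDF2 import PdfFileReader

    with open(path, 'rb') as fd:
        pdf = PdfFileReader(fd)
        for index in range(pdf.getNumPages()):
            text = pdf.getPage(index).extractText()
            print(index, round(letter_ratio(text), 2))
```

Calling `report_pages('17343_2008_Order_09-Jan-2019.pdf')` prints one line per page. Ordinary English prose should score close to 1, while text like the actual output above scores much lower.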

Garbage output on parsing law pdf files. #523