Python: Reading Specific Lines (Different Solutions for Small Files, Repeatedly Read Files, and Large Files) - iiHeys's Blog

For reading small files

Use fileobject.readlines() or for line in fileobject as a quick solution for small files.
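# Option 1: read every line into a list, then index it (indices are zero-based).
with open('filename') as f:
    lines = f.readlines()
print(lines[25])  # 26th line
print(lines[29])  # 30th line

# Option 2: iterate over the file and keep a counter.
wanted = [25, 29]  # zero-based indices of the lines to print
i = 0
with open('filename') as f:
    for line in f:
        if i in wanted:
            print(line)  # print the line itself, not its index
        i += 1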
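The two approaches above can be wrapped in a small reusable helper. This is only a sketch: the function name pick_lines and the temporary demo file are assumptions made for illustration, and because it loads the whole file it only suits small files.

```python
import os
import tempfile


def pick_lines(path, indices):
    """Return the lines at the given zero-based indices, in the order requested.

    Hypothetical helper for illustration; it reads the whole file into memory.
    """
    with open(path) as f:
        lines = f.readlines()
    # Skip indices past the end of the file instead of raising IndexError.
    return [lines[i] for i in indices if 0 <= i < len(lines)]


# Demo on a throwaway 40-line file ("line 1" ... "line 40").
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"line {n}\n" for n in range(1, 41))

picked = pick_lines(tmp.name, [25, 29])
print(picked)  # ['line 26\n', 'line 30\n']
os.remove(tmp.name)
```

Indices beyond the end of the file are silently skipped here; raising an error instead would be an equally reasonable choice.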
For reading many files, possibly repeatedly
 
There is a more elegant solution for extracting many lines: linecache. It caches the files it reads, so it can fetch lines from many files quickly, and from the same file repeatedly.
>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\n'
Change the 4 to your desired line number, and you're on. Note that linecache counts lines from 1, so 4 returns the fourth line, not the fifth.
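How linecache numbers lines is easy to check against a file whose contents are known. The sketch below assumes a temporary five-line file created only for the demonstration; the variable names are illustrative.

```python
import linecache
import os
import tempfile

# Write a small file with known contents.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\nfourth\nfifth\n")

first = linecache.getline(tmp.name, 1)     # 'first\n'
fourth = linecache.getline(tmp.name, 4)    # 'fourth\n'
missing = linecache.getline(tmp.name, 99)  # '' (no exception for a missing line)
print(repr(first), repr(fourth), repr(missing))

# Drop the cached copy before deleting the file.
linecache.clearcache()
os.remove(tmp.name)
```

Because getline() returns an empty string rather than raising an error, callers that need to detect a missing line have to test for '' themselves.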
For large files which won’t fit into memory
 
If the file is too big to fit in memory, or you do not want to read the whole file into memory at once, it may be a good idea to use enumerate(). Note that this can be slow, because the file is read sequentially from the start.
fp = open("file")
for i, line in enumerate(fp):
    if i == 25:
        print(line)  # 26th line
    elif i == 29:
        print(line)  # 30th line
    elif i > 29:
        break  # stop reading once past the 30th line
fp.close()
Note that i == n-1 for the nth line. 
In Python 2.6 or later: 
with open("file") as fp:
    for i, line in enumerate(fp):
        if i == 25:
            print(line)  # 26th line
        elif i == 29:
            print(line)  # 30th line
        elif i > 29:
            break  # stop reading once past the 30th line
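The same enumerate() pattern extends to any set of line numbers. The following is a minimal sketch, not part of the original answer: the generator name iter_selected_lines and the temporary demo file are assumptions for illustration. It holds one line in memory at a time and stops reading after the last wanted line.

```python
import os
import tempfile


def iter_selected_lines(path, wanted):
    """Yield (index, line) pairs for the zero-based indices in wanted.

    Hypothetical helper: reads the file sequentially, one line at a time.
    """
    wanted = set(wanted)
    if not wanted:
        return
    last = max(wanted)
    with open(path) as fp:
        for i, line in enumerate(fp):
            if i in wanted:
                yield i, line
            if i >= last:
                break  # nothing left to find, so stop reading early


# Demo on a throwaway 100-line file ("line 1" ... "line 100").
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"line {n}\n" for n in range(1, 101))

selected = list(iter_selected_lines(tmp.name, [25, 29]))
print(selected)  # [(25, 'line 26\n'), (29, 'line 30\n')]
os.remove(tmp.name)
```

Using a set makes each membership test constant-time, which matters when many line numbers are requested from a very long file.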
Compiled and translated from: Stack Overflow
 https://stackoverflow.com/questions/2081836/reading-specific-lines-only?answertab=active#tab-top 