[Knowledge Notes] Python Pitfalls, Part 2: The Original Sin of Calling os.system Bare

May 10, 2019


Python Pitfalls, Part 1: The Shell Child Process That Would Not Die

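The Pitfall

Today's pitfall covers not only bare calls to os.system but also bare calls to its relatives:

  • os.popen
  • subprocess family
    • subprocess.call
    • subprocess.Popen
    • subprocess.run
  • commands family (deprecated since Python 2.6, removed in Python 3)
    • commands.getstatusoutput

Newcomers fall into these traps very easily, and they are easy to miss during code review: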

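[1] Bare use of the function families above in a long-running service

  • A script launched bare through these shell-execution families can hang, which hangs the worker thread that launched it. Over time the hung threads accumulate until every worker thread is occupied and the service goes down
  • A memory-hungry service forks a child process to run a script directly
    • if that script hangs
    • and the parent process keeps modifying its memory frequently (or the modifications pile up), then even with Copy-On-Write the sheer size of the memory still poses a memory risk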

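[2] Bare use of the function families above in automated tests, without any protection

  • If the script run by a single case hangs, the whole case suite hangs with it
  • Without a per-case timeout mechanism, the run time of the case suite becomes uncontrollable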

The Fix

  1. Support a kill-on-timeout policy, and never call the shell-execution family functions bare under any circumstances
        from cup import shell
        shellexec = shell.ShellExec()
        # timeout=None will block the execution until it finishes
        shellexec.run('/bin/ls', timeout=None)
        # timeout>=0 enables non-blocking mode
        # The process will be killed if the cmd times out
        shellexec.run(cmd='/bin/ls', timeout=100)
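
For readers without CUP installed, the standard library offers a comparable guard on Python 3.5 and later. The following is a minimal sketch and not CUP's implementation: `run_with_timeout` and the shape of the returned dict are hypothetical names chosen for illustration. It relies on `subprocess.run`, which kills the direct child and re-raises `subprocess.TimeoutExpired` once the timeout expires.

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd (a list of arguments) and give up after `timeout` seconds.

    Hypothetical helper for illustration only; it is not part of CUP.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the direct child
        # before re-raising the exception.
        return {'timed_out': True, 'returncode': None,
                'stdout': None, 'stderr': None}
    return {'timed_out': False, 'returncode': done.returncode,
            'stdout': done.stdout, 'stderr': done.stderr}


if __name__ == '__main__':
    print(run_with_timeout(['sh', '-c', 'echo hi'], timeout=5))
    print(run_with_timeout(['sh', '-c', 'sleep 5'], timeout=0.5)['timed_out'])
```

Note that only the direct child is killed here; processes that the command itself spawned may survive.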

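Key points of the implementation:

  • Run the command in a new thread rather than in the main thread
  • Enforce a timeout; once it expires, apply a terminate or kill policy

For a reference implementation, see the run function of the ShellExec class:

  • https://github.com/baidu/CUP/blob/master/cup/shell/oper.py
  • An excerpt of the key code follows: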

      def run(self, cmd, timeout):
          def _pipe_asshell(cmd):
              """
              run shell with subprocess.Popen
              """
              tempscript = tempfile.NamedTemporaryFile(
                  dir=self._tmpdir, prefix=self._tmpprefix,
                  delete=True
              )
              with open(tempscript.name, 'w+b') as fhandle:
                  fhandle.write('cd {0};\n'.format(os.getcwd()))
                  fhandle.write(cmd)
              shexe = self.which('sh')
              cmds = [shexe, tempscript.name]
              log.info(
                  'cup shell execute {0} with script {1}'.format(
                      cmd, cmds)
              )
              self._subpro = subprocess.Popen(
                  cmds, stdout=subprocess.PIPE,
                  stderr=subprocess.PIPE, preexec_fn=_signal_handle
              )
              self._subpro_data = self._subpro.communicate()
          ret = {
              'stdout': None,
              'stderr': None,
              'returncode': 0
          }
          cmdthd = threading.Thread(
              target=_pipe_asshell, args=(cmd, )
          )
          cmdthd.start()
          cmdthd.join(timeout)
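          # Still alive after join(timeout) means the command timed out.
          # is_alive() replaces isAlive(), which was removed in Python 3.9.
          if cmdthd.is_alive():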
              str_warn = (
                  'Shell "%s"execution timout:%d. Killed it' % (cmd, timeout)
              )
              warnings.warn(str_warn, RuntimeWarning)
              parent = linux.Process(self._subpro.pid)
              for child in parent.children(True):
                  os.kill(child, signal.SIGKILL)
              ret['returncode'] = 999
              ret['stderr'] = str_warn
              self._subpro.terminate()
          else:
              self._subpro.wait()
              times = 0
              while self._subpro.returncode is None and times < 10:
                  time.sleep(1)
                  times += 1
              ret['returncode'] = self._subpro.returncode
              assert type(self._subpro_data) == tuple, \
                  'self._subpro_data should be a tuple'
              ret['stdout'] = self._subpro_data[0]
              ret['stderr'] = self._subpro_data[1]
          return ret
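
The excerpt above depends on CUP internals (`linux.Process`, `_signal_handle`, `self.which`). As a rough standard-library counterpart for Python 3, the sketch below swaps the thread-plus-join approach for `Popen.communicate(timeout=...)` and swaps the walk over child processes for a process-group kill. The name `run_and_kill_group` is hypothetical, and the 999 return code only mirrors the convention seen in the excerpt.

```python
import os
import signal
import subprocess


def run_and_kill_group(cmd, timeout):
    """Run a shell command string; on timeout, SIGKILL its whole process group.

    Hypothetical helper for illustration only; it is not CUP's code.
    """
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        # The child calls setsid(), so it leads a new process group whose
        # group ID equals proc.pid.
        start_new_session=True,
    )
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # Kill the shell and everything it started in the same group.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited on its own just before the kill
        proc.communicate()  # reap the child and drain the pipes
        return {'returncode': 999, 'stdout': None,
                'stderr': 'timed out after %s seconds' % timeout}
    return {'returncode': proc.returncode, 'stdout': stdout, 'stderr': stderr}


if __name__ == '__main__':
    print(run_and_kill_group('echo ok', timeout=5))
    print(run_and_kill_group('sleep 30 & sleep 30', timeout=0.5))
```

Killing the group rather than only the shell matters when the script starts background jobs; a descendant that moves itself into another process group would still escape this kill.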
    
    
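  2. Memory-hungry services/processes and long-running service processes should avoid forking a child process to run shell commands
  • Split the script-execution unit out into a separate auxiliary service
  • If it is not split out, design the long-running process to keep its memory footprint low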

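Analysis

It is worth reading the second section on what a child process inherits from its parent: when a script is run through any of the families above, it is run by starting a child process.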

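Technical Keywords

  • os.system family
  • subprocess family

Summary

Running shell commands is an extremely common operation, so many people, newcomers especially, call these functions casually without a second thought. A bare call feels great in the moment; the reckoning comes when the process dies.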

Recap of the Previous Pitfall

What a child process copies from its parent on Linux

  • First, what differs from the parent
    • pid, ppid
    • memory locks
    • tms_utime, tms_stime, tms_cutime, tms_cstime
    • pending signals
    • semaphore adjustments
    • file locks
    • pending alarms

Source:

  • Linux Programmer’s Manual ( man fork )
    • CentOS release 6.3 (Final)
    • Linux Kernel 2.6.32

fork()  creates a new process by duplicating the calling process.  The new process, referred to as the child, is an exact duplicate of the calling process, referred to as the parent, except for the following points:

    *  The child has its own unique process ID, and this PID does not match the ID of any existing process group (setpgid(2)).

    *  The child's parent process ID is the same as the parent's process ID.

    *  The child does not inherit its parent's memory locks (mlock(2), mlockall(2)).

    *  Process resource utilizations (getrusage(2)) and CPU time counters (times(2)) are reset to zero in the child.

    *  The child's set of pending signals is initially empty (sigpending(2)).

    *  The child does not inherit semaphore adjustments from its parent (semop(2)).

    *  The child does not inherit record locks from its parent (fcntl(2)).

    *  The child does not inherit timers from its parent (setitimer(2), alarm(2), timer_create(2)).

    *  The child does not inherit outstanding asynchronous I/O operations from its parent (aio_read(3), aio_write(3)), nor does it inherit any asynchronous I/O contexts from its parent (see io_setup(2)).

       The process attributes in the preceding list are all specified in POSIX.1-2001.  The parent and child also differ with respect to the following Linux-specific process attributes:

    *  The child does not inherit directory change notifications (dnotify) from its parent (see the description of F_NOTIFY in fcntl(2)).

    *  The prctl(2) PR_SET_PDEATHSIG setting is reset so that the child does not receive a signal when its parent terminates.

    *  Memory mappings that have been marked with the madvise(2) MADV_DONTFORK flag are not inherited across a fork().

    *  The termination signal of the child is always SIGCHLD (see clone(2)).

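Next, what is inherited or copied from the parent

  • Including
    • data space
    • heap and stack
    • real user ID, real group ID, effective user ID, effective group ID, set-user-ID flag and set-group-ID flag
    • process group ID
    • session ID
    • controlling terminal
    • current working directory, root directory
    • file mode creation mask
    • signal mask settings
    • open file descriptors
    • environment
    • shared memory segments
    • memory mappings
    • resource limits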
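
A few of the items in the two lists can be observed directly. The sketch below is an illustration only, assuming a Unix system where `os.fork` is available; `fork_and_report` is a hypothetical name. The child gets its own pid, its ppid is the parent's pid, and it inherits both the current working directory and the open pipe descriptor it uses to report back.

```python
import os


def fork_and_report():
    """Fork once; the child reports its pid, ppid and cwd through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the pipe descriptors and the cwd were inherited from the parent.
        try:
            os.close(read_fd)
            report = '%d %d %s' % (os.getpid(), os.getppid(), os.getcwd())
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)  # exit without running the parent's cleanup handlers
    # Parent: read the child's report, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_pid, child_ppid, child_cwd = reader.read().split(' ', 2)
    os.waitpid(pid, 0)
    return int(child_pid), int(child_ppid), child_cwd


if __name__ == '__main__':
    child_pid, child_ppid, child_cwd = fork_and_report()
    print('parent pid :', os.getpid())
    print('child pid  :', child_pid)
    print('child ppid :', child_ppid)
    print('child cwd  :', child_cwd)
```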

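In addition

Why a resident agent process should choose a port above 60000

On Linux, the system normally picks a local port automatically when a program connects to a destination port specified by the user, and the range it picks from is configured in advance. For example, on the author's CentOS machine: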

$ cat /proc/sys/net/ipv4/ip_local_port_range
10000   60000
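
The same check can be scripted. One plausible reading of the heading, stated here as an assumption, is that a listening port above the upper bound cannot be taken by a local port that the system assigns automatically to an outgoing connection. The sketch below parses the two numbers shown above and tests whether a candidate port lies outside that range; all function names are hypothetical.

```python
def parse_port_range(text):
    """Parse the 'low high' pair found in ip_local_port_range."""
    low, high = (int(field) for field in text.split())
    return low, high


def read_local_port_range(path='/proc/sys/net/ipv4/ip_local_port_range'):
    """Read the automatically assigned local port range on Linux."""
    with open(path) as handle:
        return parse_port_range(handle.read())


def is_outside_local_range(port, port_range):
    """True if `port` can never be handed out as an automatic local port."""
    low, high = port_range
    return port < low or port > high


if __name__ == '__main__':
    port_range = parse_port_range('10000   60000')  # the values shown above
    print(is_outside_local_range(60001, port_range))  # above the range
    print(is_outside_local_range(30000, port_range))  # inside the range
```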
