HARPA.AI

🧩  Twitter Thread Extraction

Extracts the original Tweet and up to a chosen number of its replies from a Twitter post. Use this command on a single Tweet page. #extraction

Created by Adrian Larsson
Updated on Nov 9, 2024 04:32
Installed 173 times
RUNS JS CODE
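
The command only works on a single Tweet page. It derives the author's handle from the first path segment of the URL (`window.location.pathname.split('/')[1]`). The sketch below is a hypothetical pre-check and is not part of the command. It assumes single Tweet URLs have the form `https://twitter.com/<handle>/status/<id>` (or the same path on `x.com`) and returns `null` for any other URL.

```javascript
// Hypothetical helper (not part of the command): checks whether a URL looks like
// a single Tweet page and extracts the handle and status ID.
// Assumes the https://twitter.com/<handle>/status/<id> (or x.com) URL shape.
function parseTweetUrl (href) {
  const url = new URL(href)
  const host = url.hostname.replace(/^(www\.|mobile\.)/, '')
  if (host !== 'twitter.com' && host !== 'x.com') return null

  // "/jack/status/20" splits into ["", "jack", "status", "20"]
  const [, handle, section, statusId] = url.pathname.split('/')
  if (!handle || section !== 'status' || !/^\d+$/.test(statusId || '')) return null

  return { username: '@' + handle, statusId }
}

console.log(parseTweetUrl('https://twitter.com/jack/status/20')) // { username: '@jack', statusId: '20' }
console.log(parseTweetUrl('https://x.com/home')) // null
```

A command could run a check like this before scrolling and show a message instead of returning empty data.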

Content

- type: say
  message: >-
    📌 Use this command on a single Tweet page. You can use this command or its
    JS code as a base for other commands or automations.
- type: ask
  param: targetMessageCount
  message: How many replies would you like to extract?
  options:
    - label: 50 replies
      value: 50
    - label: 100 replies
      value: 100
    - label: 200 replies
      value: 200
    - $custom
  default: ''
  vision:
    enabled: false
    mode: area
    send: true
    hint: ''
  optionsInvalid: false
- type: js
  code: |2-
      async function scrollAndCollectTweet (targetMessageCount) {
        // CSS selectors for Twitter's current markup; update them if the page
        // structure changes.
        const config = {
          tweet: 'article[data-testid="tweet"]',
          author: 'div[data-testid="User-Name"] span:first-child',
          tweetText: 'div[data-testid="tweetText"]',
          timestamp: 'time',
          stats: 'div[aria-label*="replies"]',
        }

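        // Returns the text of the first descendant <span> that starts with "@"
        // (the user's handle), or '' if there is none.
        function findTwitterHandle (element) {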
          const xpath = './/span[starts-with(text(), "@")]'
          const result = document.evaluate(
            xpath,
            element,
            null,
            XPathResult.FIRST_ORDERED_NODE_TYPE,
            null
          )
          return result.singleNodeValue?.textContent || ''
        }

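        // Replies load as the page scrolls and earlier ones can be rendered
        // again, so replies are collected after every scroll and de-duplicated.
        async function scrollAndCollectComments () {
          const comments = []
          const uniqueComments = new Set()
          let retries = 0
          // Stop after this many consecutive scrolls that add no new replies.
          const maxRetries = 5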

          // Scroll to the bottom and give newly loaded replies 500 ms to render.
          async function scrollAndWait () {
            window.scrollTo(0, document.body.scrollHeight)
            await new Promise(resolve => setTimeout(resolve, 500))
          }

          function extractCurrentComments () {
            // Skip the first article on the page, which is the original Tweet.
            const commentElements = Array.from(
              document.querySelectorAll(config.tweet)
            ).slice(1)
            let newCommentsFound = false

            commentElements.forEach(comment => {
              const author =
                comment.querySelector(config.author)?.innerText || 'Unknown'
              const time = comment.querySelector(config.timestamp)?.innerText || ''
              const content =
                comment.querySelector(config.tweetText)?.innerText || ''
              const userHandle = findTwitterHandle(comment)

              // Key used to skip replies that were already collected.
              const commentId = `${time}-${author}-${content}`

              if (!uniqueComments.has(commentId) && author && time && content) {
                const formattedComment = `${time}. ${author} (${userHandle}): ${content}`
                uniqueComments.add(commentId)
                comments.push(formattedComment)
                newCommentsFound = true
              }
            })

            return newCommentsFound
          }

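          // Keep scrolling until enough replies are collected or several
          // scrolls in a row add nothing new.
          while (comments.length < targetMessageCount && retries < maxRetries) {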
            const newCommentsFound = extractCurrentComments()

            if (comments.length < targetMessageCount) {
              await scrollAndWait()
              if (!newCommentsFound) {
                retries++
              } else {
                retries = 0
              }
            }
          }

          if (comments.length > targetMessageCount) {
            comments.splice(targetMessageCount)
          }

          return comments
        }

        try {
          const result = {
            post: {
              author: document.querySelector(config.author)?.innerText || '',
              username: '@' + window.location.pathname.split('/')[1],
              content: document.querySelector(config.tweetText)?.innerText || '',
              timestamp: document.querySelector(config.timestamp)?.innerText || '',
              stats:
                document.querySelector(config.stats)?.getAttribute('aria-label') ||
                '',
            },
            comments: await scrollAndCollectComments(),
            url: window.location.href
          }

          window.scrollTo(0, 0)

          return result
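        } catch (error) {
          // Restore the scroll position and report the error instead of failing silently.
          console.error('Twitter thread extraction failed:', error)
          window.scrollTo(0, 0)
          return null
        }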
      }

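      // The ask step can return a number or free text ($custom); fall back to 50.
      return scrollAndCollectTweet(Number.parseInt(targetMessageCount, 10) || 50)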
  param: array
  timeout: 15000
  args: targetMessageCount
  silent: true
- type: say
  message: |-
    **Extracted data:**

    {{array}}
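
The reply loop in `scrollAndCollectComments` is easier to reason about without the DOM. The following is a DOM-free re-creation for illustration only. The names `collectReplies` and `snapshots` are hypothetical, with `snapshots` standing in for the Tweets visible after each scroll, and the sketch runs synchronously rather than awaiting a scroll. It keeps the command's rules:

- Each reply is keyed by `time-author-content`, so a Tweet that is rendered again is not counted twice.
- The retry counter resets whenever a round adds something.
- The loop stops at the target count or after `maxRetries` consecutive rounds with nothing new.

```javascript
// Hypothetical, DOM-free re-creation of the command's reply loop (synchronous
// for clarity). `snapshots[i]` stands in for the Tweets visible after scroll i;
// each Tweet is an object { time, author, handle, content }.
function collectReplies (snapshots, targetCount, maxRetries = 5) {
  const comments = []
  const seen = new Set() // plays the role of `uniqueComments`
  let retries = 0
  let step = 0

  while (comments.length < targetCount && retries < maxRetries) {
    // Beyond the last snapshot, "scrolling" reveals nothing new.
    const visible = snapshots[step] || []
    let newFound = false

    for (const c of visible) {
      const id = `${c.time}-${c.author}-${c.content}`
      if (!seen.has(id) && c.author && c.time && c.content) {
        seen.add(id)
        comments.push(`${c.time}. ${c.author} (${c.handle}): ${c.content}`)
        newFound = true
      }
    }

    step++ // stands in for scrollAndWait()
    // Any progress resets the counter; an empty round counts as a miss.
    retries = newFound ? 0 : retries + 1
  }

  // Trim any overshoot from the last round, like comments.splice(targetMessageCount).
  return comments.slice(0, targetCount)
}

const ann = { time: '1h', author: 'Ann', handle: '@ann', content: 'First' }
const bob = { time: '2h', author: 'Bob', handle: '@bob', content: 'Second' }
// Overlapping snapshots mimic Tweets that are rendered again after a scroll.
console.log(collectReplies([[ann], [ann, bob], [bob]], 10).length) // 2
console.log(collectReplies([[ann], [ann, bob]], 1)) // [ '1h. Ann (@ann): First' ]
```

Because any progress resets the counter, `maxRetries = 5` combined with the 500 ms wait means the command gives up after roughly 2.5 seconds of scrolling without new replies.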
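
Each entry in `comments` is a single string of the form `time. author (@handle): text`. A downstream command may need these fields separately. The hypothetical `parseReply` helper below splits them again. It assumes that the author's display name never contains ` (@` and that handles use only letters, digits, and underscores. When `findTwitterHandle` finds no handle, the string contains `()` and the helper returns `''` for the handle.

```javascript
// Hypothetical helper (not part of the command): splits one formatted reply
// string back into fields. Assumes the "time. author (@handle): text" layout
// and that display names never contain " (@".
function parseReply (line) {
  // [\s\S]* lets the reply text span several lines.
  const match = /^(.*?)\. (.*?) \((@\w*)?\): ([\s\S]*)$/.exec(line)
  if (!match) return null
  const [, time, author, handle = '', content] = match
  return { time, author, handle, content }
}

const reply = parseReply('2h. Ann Lee (@ann_lee): Great thread!')
console.log(reply.handle, reply.content) // @ann_lee Great thread!
```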

Notice: Please read before using

This automation command was created by a community member. The HARPA AI team does not audit community commands.

Please review the command carefully and install it only if you trust the creator.
