[Ssc-dev] simple question (I hope)

barbra blmack at gmail.com
Wed Jul 3 13:29:46 EDT 2013


No, no. I think that works.  And it is similar to what I would end up
having to do anyway; it's just triggering at the time of pulling it down
instead of when something is added to a directory. I like it.

-barbra




On Wed, Jul 3, 2013 at 1:20 PM, Harlo Holmes <harlo.holmes at gmail.com> wrote:

> Please let me know if this doesn't make sense...
>
> there is a script that wraps the entire process described here:
> 1) pulls new media directly from a repository (whether that is Google
> Drive or Globaleaks or whatever we decide, or any combination of those
> options)
>
> 2a) if that media object contains the submitter's public key and base
> image, then it creates a Source object in the database
>
> 2b) if that media object contains an image or a video, then it does 2
> things:
>     - it sends the media file through the J3M parser
>     - it creates a Submission object in the database, which contains the
> following info:
>
>    - "asset_path": the root directory the media has been placed into
>    -  "file_name": the name of the file,
>    -  "date_admitted": long, unix time (barbra suggested some fixes to
>    sync timezones, which I can get to),
>    -  "j3m": the name of the j3m data (as generated by the J3M parser,
>    which is in the same directory),
>    -  "_id": the ID in the database
>
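The Submission fields listed above might be sketched as a plain record like the following; the values (paths, file names, IDs) are hypothetical stand-ins, not taken from the actual service:

```python
import json
import time

# Hypothetical Submission record with the fields described above.
# All values are illustrative placeholders.
submission = {
    "asset_path": "/var/informacam/submissions/abc123",  # root directory the media was placed into
    "file_name": "IMG_0001.jpg",                         # name of the media file
    "date_admitted": int(time.time() * 1000),            # long, unix time
    "j3m": "IMG_0001.j3m",                               # J3M data generated by the parser, same directory
    "_id": "submission_abc123",                          # ID in the database
}

print(json.dumps(submission, indent=2))
```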
> From there, the script should be extended to hook into Barbra's mechanism
> for indexing and creating derivative objects.  Meaning, this script could
> pass Barbra's script the Submission object (see
> https://github.com/harlo/InformaCam-Service/blob/master/scripts/py/InformaCamModels/submission.py)
> which already inherits methods for saving itself into couchdb, and updating
> itself.  Or Barbra could create a derivative object (probably also
> extending the asset superclass?) and take it from there.
>
> This doesn't have to be triggered by this watcher script, though-- it can
> be decoupled however it makes sense...
>
>
> On Wed, Jul 3, 2013 at 1:04 PM, David Oliver <david at guardianproject.info> wrote:
>
>> OK, just so I understand this in Country-boy English,  the suggestion is
>> that Svetlana "wrap" her j3m-ifier tool in a script that, when j3m-ifier
>> completes on the provided work, will invoke ANOTHER script that will insert
>> the data into the CouchDB appropriately.
>>
>> I like that approach.  j3m-ifier as a tool itself remains unconcerned
>> about the analytics portion - it just knows files and file systems.
>> Instead, a script "wraps" the tool and simply invokes something that DOES
>> know how to deal with the analytics.
>>
>> OK, as to that SECOND script - the one invoked by j3m-ifier - *who is
>> going to write that*?  I'm assuming it will be a shell script?
>>
>> David M. Oliver | david at guardianproject.info |
>> http://guardianproject.info | @davidmoliver | +1 970 368 2366
>>
>>
>> On Wed, Jul 3, 2013 at 12:54 PM, barbra <blmack at gmail.com> wrote:
>>
>>> So, I spoke with Harlo about this....:
>>>
>>> 1. If Svetlana can have the J3Mifier place the derivative images and such
>>> inside a sub-directory of the newly submitted submission directory, then
>>> there would be a script to monitor for any new additions to the parent
>>> storage directory. A new "record" would be created in couch based on
>>> extracted json. I would be able to generalize enough (without having to
>>> know the directory and image names in advance) if the derivatives were
>>> placed in this sub-directory.
>>>
>>> 2. Or, we run a separate script, at the same time as the J3Mifier
>>> process, that takes that same directory name + image name, and creates the
>>> appropriate records in couchdb.
>>>
>>> The general script that creates the appropriate derivatives record I
>>> have already created.
>>>
>>> I prefer #1 as it completely removes the ties to the J3Mifier. And the
>>> script cannot be run anyway until the J3Mifier is complete, as it relies on
>>> the json files that get generated. And with #2, anyone wanting to use the
>>> analyzer would have to tie the processes of the j3mifier to the analyzer
>>> API. (and maybe that's ok too...)
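Option #1 above boils down to polling the parent storage directory and reporting only those submission directories whose derivatives sub-directory already exists (i.e. the J3Mifier has finished). A minimal sketch; the "derivatives" name is a hypothetical placeholder:

```python
import os

# Polling sketch of option #1: scan the parent storage directory for newly
# added submission directories, without knowing directory or image names in
# advance. The sub-directory name is an illustrative assumption.
DERIVATIVES_SUBDIR = "derivatives"

def new_submissions(parent, seen):
    """Return (ready, current): submission dirs added since `seen` whose
    derivatives sub-directory exists, plus the full current listing."""
    current = set(os.listdir(parent))
    ready = [
        name for name in sorted(current - seen)
        if os.path.isdir(os.path.join(parent, name, DERIVATIVES_SUBDIR))
    ]
    return ready, current
```

A cron job or monitor loop would call this periodically, carrying `current` forward as `seen`.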
>>>
>>> But I think Harlo wanted to go with #2 in the short term, just to focus
>>> on getting it out the door. I'm not sure how much tweaking it would take on
>>> Svetlana's part to do the sub-directory arrangement. So, for next week I
>>> think we just use dummy data and decide between #1 and #2 the week after?
>>>
>>> -barbra
>>>
>>>
>>>
>>> On Wed, Jul 3, 2013 at 12:33 PM, David Oliver <
>>> david at guardianproject.info> wrote:
>>>
>>>> Thank you for your time yesterday describing how the "analytical"
>>>> components work. (I'm hesitating to use the word "server" anymore).
>>>>
>>>> Remembering that I am TOTALLY focused on seeing a demonstration of
>>>> these analytics, but limited to (a) doing a search, (b) displaying a list
>>>> of results, and (c) viewing a single result, my understanding is that you
>>>> will be "ingesting" into your system images and metadata that are prepared
>>>> for you by "j3m-ifier".
>>>>
>>>> From my recent discussion with Svetlana, I learned that j3m-ifier
>>>> simply takes a file name and a directory, and "unpacks the package" into
>>>> that directory.
>>>>
>>>> *My questions to you*:
>>>>
>>>> 1. Do you currently have a "data ingestion" process? (non-manual)
>>>> 2. How does your analytics system know that new data is available to
>>>> "ingest"?
>>>> 3. Is the plan to have a "cron job" that periodically runs through the
>>>> file system looking for new data?
>>>> 4. OR, did you plan to have the j3m-ifier "tell you" that new data was
>>>> ready?
>>>>
>>>> I am trying to figure out who "owns" the (automated) process of getting
>>>> data into your analytics system.
>>>>
>>>> Dave
>>>>
>>>>
>>>> David M. Oliver | david at guardianproject.info |
>>>> http://guardianproject.info | @davidmoliver | +1 970 368 2366
>>>>
>>>
>>>
>>
>